N=1:

rank | frequency | n-gram |
---|---|---|
1 | 32406 | -a |
2 | 19763 | -e |
3 | 18514 | -i |
4 | 12116 | -m |
5 | 11727 | -u |

N=2:

rank | frequency | n-gram |
---|---|---|
1 | 6690 | -om |
2 | 5026 | -na |
3 | 4516 | -je |
4 | 4482 | -ja |
5 | 4041 | -ma |

N=3:

rank | frequency | n-gram |
---|---|---|
1 | 2221 | -ima |
2 | 2160 | -ija |
3 | 1812 | -nje |
4 | 1790 | -ije |
5 | 1725 | -nom |

N=4:

rank | frequency | n-gram |
---|---|---|
1 | 1206 | -anje |
2 | 1059 | -skog |
3 | 1016 | -anja |
4 | 784 | -skom |
5 | 765 | -skih |

N=5:

rank | frequency | n-gram |
---|---|---|
1 | 484 | -vanje |
2 | 445 | -nosti |
3 | 427 | -anjem |
4 | 424 | -vanja |
5 | 402 | -acija |

The tables show the most frequent letter N-grams at the ends of words for N=1…5. The procedure parallels that of 2.2.5 Most frequent word beginnings; the aim here is suffix detection rather than prefix detection.
For N=3:

    SELECT @pos:=(@pos+1), xx.*
    FROM (SELECT @pos:=0) r,
         (SELECT COUNT(*) AS cnt, CONCAT("-", RIGHT(word,3))
          FROM words
          WHERE w_id>100
          GROUP BY RIGHT(word,3)
          ORDER BY cnt DESC) xx
    LIMIT 5;
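
The same counting can be sketched outside the database for any N. The following is a minimal Python sketch, not the method used to produce the tables: it assumes the words are available as a plain list, and the function name `top_endings` and the sample words are hypothetical. It skips the `w_id>100` filter, since that depends on the database's ID scheme.

```python
from collections import Counter

def top_endings(words, n, limit=5):
    """Return (rank, count, "-ending") rows for the most frequent n-letter endings."""
    # word[-n:] keeps the whole word when it is shorter than n,
    # which mirrors what MySQL's RIGHT(word, n) does in the query.
    counts = Counter(word[-n:] for word in words)
    # most_common() sorts by count, descending; each list entry is counted once,
    # just as COUNT(*) counts each row of the words table once.
    return [
        (rank, count, "-" + ending)
        for rank, (ending, count) in enumerate(counts.most_common(limit), start=1)
    ]

# Hypothetical sample; in practice this would be the word column of the words table.
sample = ["pisanje", "čitanje", "pjevanje", "znanje", "radnom", "novom", "kućama"]
for n in range(1, 6):
    print(n, top_endings(sample, n, limit=2))
```

Endings with equal counts come out in the order they were first seen, so ranks among ties are arbitrary, as they are in the SQL query.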